Mixed-Source Multi-Document Speech-to-Text Summarization
نویسندگان
چکیده
Speech-to-text summarization systems usually take as input the output of an automatic speech recognition (ASR) system that is affected by issues like speech recognition errors, disfluencies, or difficulties in the accurate identification of sentence boundaries. We propose the inclusion of related, solid background information to cope with the difficulties of summarizing spoken language and the use of multi-document summarization techniques in single document speechto-text summarization. In this work, we explore the possibilities offered by phonetic information to select the background information and conduct a perceptual evaluation to better assess the relevance of the inclusion of that information. Results show that summaries generated using this approach are considerably better than those produced by an up-to-date latent semantic analysis (LSA) summarization method and suggest that humans prefer summaries restricted to the information conveyed in the input source.
منابع مشابه
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
متن کاملAn Approach for Concept-based Automatic Multi- Document Summarization using Machine Learning
Text Summarization is compressing the source text into a shorter version preserving its information content and overall meaning. It is very complicated for human beings to manually summarize large documents of text. Text summarization plays an important role in the area of natural language processing and text mining. Many approaches use statistics and machine learning techniques to extract sent...
متن کاملThe Next Step for Multi-Document Summarization: A Heterogeneous Multi-Genre Corpus Built with a Novel Construction Approach
Research in multi-document summarization has focused on newswire corpora since the early beginnings. However, the newswire genre provides genre-specific features such as sentence position which are easy to exploit in summarization systems. Such easy to exploit genre-specific features are available in other genres as well. We therefore present the new hMDS corpus for multi-document summarization...
متن کاملText Summarization Using Cuckoo Search Optimization Algorithm
Today, with rapid growth of the World Wide Web and creation of Internet sites and online text resources, text summarization issue is highly attended by various researchers. Extractive-based text summarization is an important summarization method which is included of selecting the top representative sentences from the input document. When, we are facing into large data volume documents, the extr...
متن کاملA Proposed Textual Graph Based Model for Arabic Multi-document Summarization
Text summarization task is still an active area of research in natural language preprocessing. Several methods that have been proposed in the literature to solve this task have presented mixed success. However, such methods developed in a multi-document Arabic text summarization are based on extractive summary and none of them is oriented to abstractive summary. This is due to the challenges of...
متن کامل